Steps in Move 3 |
Title: Pitfalls in Corpus Research
Author(s): TONI RIETVELD, ROELAND VAN HOUT and MIRJAM ERNESTUS
Journal: Computers and the Humanities 38 (2004). |
|
Describe the present research + outline the structure of this paper |
10. In Section 2, we start with transcription and coding, where conflicting judgments between experts or evaluators quite often show up. |
|
11. The degree of conflict can be made clear by calculating agreement indices. |
Describe the present research |
12. Moreover, we will show how data on which disagreement occurs ought to be dealt with in the analysis. |
Outline the structure of this paper |
13. The statistical analysis of frequency data is the central topic of Section 3. |
|
14. Basically, the analysis of this type of data is fairly straightforward. |
|
15. The primary technique is χ² analysis, a technique explained in introductory textbooks on statistics. |
|
16. An important assumption of χ² analysis and equivalent statistics is the independence of observations, |
|
16.1. and precisely this assumption is problematic in corpus research. |
|
17. We show how two kinds of dependences may interfere in the statistical analysis, both resulting in a Type I error which is too high; |
|
17.1. (that is to say that) the significance of an effect is claimed too often where in fact there is no effect. |
Describe the present research + outline the structure of this paper |
18. Section 4 deals with two other well-known problems in χ² analysis, viz. the effects of small and large samples. |
|
19. Small samples tend to yield few significant effects, |
|
19.1. while the ‘high significance’ levels obtained with large samples are often incorrectly interpreted as indicators of substantial effects. |
|
20. For small samples the concept of power is relevant. |
|
21. For large samples, we need an index which expresses the size of an effect, independently from the sample size. |
Describe the present research + outline the structure of this paper |
22. In Section 5, we discuss the use of the log odds ratio as an alternative to χ² analysis. |
|
23. Its use is still quite rare in corpus analysis, |
|
23.1. although it has outstanding statistical properties. |
|
24. Log odds form the basis of attractive multivariate techniques, such as logit analysis and logistic regression. |
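
Illustration (not part of the annotated paper): sentences 19.1 to 22 contrast the chi-square (χ²) statistic, whose value grows with sample size, with an effect-size index such as the log odds ratio, which does not depend on sample size. The sketch below may help make that contrast concrete. The counts are hypothetical, the function names are chosen here for illustration, and the formulas are the standard textbook ones for a 2 × 2 table (Pearson χ² without continuity correction; log odds ratio with its asymptotic standard error), not necessarily the exact procedure the authors use.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def log_odds_ratio(a, b, c, d):
    """Natural log of the odds ratio (a/b) / (c/d) and its asymptotic standard error."""
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se


# Hypothetical counts: rows = two corpora, columns = variant present / absent.
small = (30, 70, 20, 80)
# Same proportions, ten times as many observations.
large = tuple(10 * x for x in small)

for label, table in (("small", small), ("large", large)):
    chi2 = chi_square_2x2(*table)
    lor, se = log_odds_ratio(*table)
    print(f"{label}: chi2 = {chi2:.2f}, log odds ratio = {lor:.3f} (SE {se:.3f})")
```

With these made-up counts, χ² rises from about 2.67 to about 26.67 when the sample is multiplied by ten, crossing the 3.84 critical value for one degree of freedom at the 0.05 level, while the log odds ratio stays the same and only its standard error shrinks.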